Memory-based Robust Interpretation of Recognised Speech
Abstract
We describe a series of experiments in which memory-based machine learning techniques are used to interpret spoken user input in human-machine interactions. In these experiments, the task is to determine the dialogue act of the user input and the types of information slots the user fills, on the basis of a variety of features representing the spoken input (speech measurements and word recognition information) as well as its context (the interaction history). In the first experiment, we perform this task using the complete word graph output of the automatic speech recogniser. This yields an overall accuracy of 76.2%, with an F-score of 91.3 on dialogue act classification and an F-score of 87.7 on filled slot types. In the second experiment, we investigate the usefulness of two approaches to filtering out possibly non-contributing word recognition information from the speech recogniser output: (i) filtering out disfluencies, and (ii) keeping only syntactic chunk heads.
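As a rough illustration of what memory-based classification means here, the sketch below stores labelled examples verbatim and labels a new utterance by its nearest stored neighbours under a simple feature-overlap distance. It is not the authors' system: the feature layout (previous system prompt plus two recognised words), the labels, and the names `overlap_distance`, `classify`, and `MEMORY` are all assumptions made for illustration.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(memory, instance, k=1):
    """Return the majority label among the k stored instances nearest to `instance`."""
    nearest = sorted(memory, key=lambda ex: overlap_distance(ex[0], instance))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical memory of (features, label) pairs.
# Features: (previous system prompt type, recognised word 1, recognised word 2).
MEMORY = [
    (("ask_destination", "to", "amsterdam"), "slot_filling:destination"),
    (("ask_time", "at", "nine"), "slot_filling:time"),
    (("verify_destination", "yes", "correct"), "affirmative"),
    (("verify_destination", "no", "not"), "negative"),
]

print(classify(MEMORY, ("ask_destination", "to", "utrecht")))
```

No abstraction is learned from the examples: classification is deferred to lookup time, which is why the choice of features (and the filtering of unhelpful ones, as in the second experiment) matters so much.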
Similar resources
Interpreting Multilinear Representations in Speech
This paper discusses the interpretation of multilinear representations of speech utterances using a computational linguistic model. The model uses a feature-based finite state automaton representation of phonotactic constraints and axioms of event logic to provide a multilinear representation with a temporal interpretation. The asynchronous nature of the features in multilinear representations ...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Phonological similarity and the irrelevant speech effect: implications for models of short-term verbal memory.
Three experiments studied the interaction between irrelevant speech and phonological similarity within both the remembered and the irrelevant auditory material. Phonological similarity within the remembered list impaired performance in both baseline and irrelevant speech conditions, whereas phonological similarity between the remembered and ignored irrelevant items did not influence performance...
Relationship between Working Memory, Auditory Perception and Speech Intelligibility in Cochlear Implanted Children of Elementary School
Objectives: This study examined the relationship between working and short-term memory performance, and their effects on cochlear implant outcomes (speech perception and speech production) in cochlear implanted children aged 7-13 years. The study also compared the memory performance of cochlear implanted children with their normal hearing peers. Methods: Thirty-one cochlear impl...